Abstract

In this paper, we study the robust subspace clustering problem, which aims to cluster given, possibly noisy, data points into their underlying subspaces. A large pool of previous subspace clustering methods focus on graph construction through different regularizations of the representation coefficients. We instead focus on the robustness of the model to non-Gaussian noises. We propose a new robust clustering method based on the correntropy induced metric, which is robust to non-Gaussian and impulsive noises. We further extend the method to handle data with outlier rows/features. The multiplicative form of half-quadratic optimization is used to optimize the non-convex correntropy objective functions of the proposed models. Extensive experiments on face datasets demonstrate that the proposed methods are more robust to corruptions and occlusions.

1. Introduction 

In the pattern recognition and computer vision communities, data usually follow certain types of simple structure that enable intelligent representation. Subspaces are possibly the most widely used data model, since many real-world data, such as face images and motions, can be well characterized by subspaces. Given a set of data points drawn from multiple subspaces, the goal of subspace clustering is to (1) cluster these data points into clusters, with each cluster corresponding to a subspace, and (2) predict the memberships of the subspaces, including the number of subspaces and the basis of each subspace. Subspace clustering is a fundamental problem and has numerous applications in the machine learning and computer vision literature, e.g. motion segmentation [?] and image clustering [?]. The challenge in these applications is that the only known information is the data points, which are usually contaminated by various noises. Figure 1 illustrates some face images from three subjects. The face images with pixel corruption, sunglasses and/or scarf deviate from their underlying subspaces, which makes subspace clustering challenging. This paper aims to address the robust subspace clustering problem with various noises, such as non-Gaussian noises.

Figure 1. Face images belonging to different subjects lie in different subspaces. Noises and corruptions deviate the data from the underlying subspaces.

1.1. Summary of Main Notations 

In this work, matrices are represented with capital symbols. In particular, $I$ denotes the identity matrix. For a matrix $M$, $M_{ij}$ and $(M)_{ij}$ denote its $(i, j)$-th entry, $M^i$ is its $i$-th row, and $M_j$ is its $j$-th column. $\mathrm{Diag}(v)$ converts the vector $v$ into a diagonal matrix whose $i$-th diagonal entry is $v_i$. $\mathbb{R}_+$ denotes the set of non-negative real values and $\mathbb{S}_+^{n \times n}$ denotes the set of $n \times n$ positive semi-definite matrices. $M \succ 0$ denotes that $M$ is symmetric and positive definite. $C^1$ denotes the set of functions with continuous first derivatives. $\|v\|_2$ and $\|v\|_\infty$ denote the L2 norm and the infinity norm of a vector $v$, respectively. The L1 norm, L21 norm and nuclear norm of a matrix $M$ are defined as $\|M\|_1 = \sum_{ij}|M_{ij}|$, $\|M\|_{21} = \sum_j \|M_j\|_2$ and $\|M\|_* = \sum_i \sigma_i$ ($\sigma_i$ is the $i$-th singular value of $M$), respectively.

1.2. Related Work 

Many subspace clustering methods have been proposed [?]. In this work, we focus on the recent graph based subspace clustering methods [3], [4], [11], [10], [13]. These methods are based on spectral clustering, whose first step is to construct an affinity (or graph) matrix that is close to block diagonal, with zero elements corresponding to data pairs from different subspaces. After the affinity matrix is learned, Normalized Cut [20] is employed to segment the data into multiple clusters. For a given data matrix $X \in \mathbb{R}^{d \times n}$, where $d$ denotes the feature dimension and $n$ is the number of data points, the most recent methods, including the L1-graph [3] or Sparse Subspace Clustering (SSC) [4], Low-Rank Representation (LRR) [10], Multi-Subspace Representation (MSR) [14] and Least Squares Regression (LSR) [13], learn the affinity matrix $Z \in \mathbb{R}^{n \times n}$ by solving the following common problem:

$$\min_{Z} \ \mathcal{L}(X - XZ) + \lambda\mathcal{R}(Z). \qquad (1)$$

For the L1-graph or SSC, $\mathcal{L}(X - XZ) = \|X - XZ\|_F^2$ and $\mathcal{R}(Z) = \|Z\|_1$. The motivation of SSC is that L1-minimization leads to a sparse solution that tends to be block diagonal. As pointed out in [?], however, L1-minimization does not exhibit the grouping effect, and is thus weak at grouping correlated data points together.

For LRR, $\mathcal{L}(X - XZ) = \|X - XZ\|_{21}$ and $\mathcal{R}(Z) = \|Z\|_*$. LRR aims to find a low-rank affinity matrix. When the data are drawn from independent subspaces, LRR leads to a block diagonal solution which can recover the true subspaces. In the noisy case, LRR uses the robust L21 norm to remove outlier samples.

MSR simply combines the criteria of SSC and LRR: $\mathcal{L}(X - XZ) = \|X - XZ\|_{21}$ and $\mathcal{R}(Z) = \|Z\|_1 + \gamma\|Z\|_*$. Thus MSR can be regarded as a tradeoff between SSC and LRR, but it needs to tune one more parameter $\gamma$.

LSR uses the Frobenius norm to model both the reconstruction error and the representation matrix: $\mathcal{L}(X - XZ) = \|X - XZ\|_F^2$ and $\mathcal{R}(Z) = \|Z\|_F^2$. LSR has a closed-form solution, which makes it efficient, and its grouping effect makes it effective for subspace clustering.

The above methods share the common formulation (1). The Frobenius norm and the L21 norm are used as the loss function, while the L1 norm, nuclear norm and Frobenius norm are used to control the affinity matrix. Different formulations require different solvers.

In this work, we show that the L1 norm, L21 norm and nuclear norm all satisfy certain conditions, and thus previous subspace clustering methods, including SSC, LRR and MSR, can be unified within a general framework from the perspective of half-quadratic optimization [17]. The relationship between this general framework and previous optimization methods for sparse and low-rank minimization is also presented.

Different from previous methods, which focus on the regularization term $\mathcal{R}(Z)$, this work focuses on the reconstruction error term $\mathcal{L}(X - XZ)$ for robust subspace learning. Several previous works (e.g. SSC and LSR) use the Frobenius norm to measure the quality of approximation, which is optimal for independent and identically distributed (i.i.d.) Gaussian noise but not robust to outliers. LRR, by using the L21 norm, is able to remove outlier samples, but it is sensitive to outlier features. To overcome the weakness of the mean squared error, we propose a new robust subspace clustering method that uses the correntropy induced metric as the loss function. As in LSR, the Frobenius norm is used to regularize the affinity matrix so as to preserve the grouping effect. We then minimize the non-convex correntropy objective of the proposed method by alternating minimization.

1.3. Contributions and Organization

We summarize the contributions of this work as follows:


• We propose a new robust subspace clustering method, the Correntropy Induced L2 (CIL2) graph. It is able to handle data with non-Gaussian noises. We also extend CIL2 to handle data with outlier rows/features.

• We apply the correntropy induced L2 graph to face clustering under various types of corruptions and occlusions. Extensive experiments demonstrate the effectiveness of the proposed method in comparison with state-of-the-art methods.


The remainder of this paper is organized as follows. Section 2 gives a brief review of half-quadratic analysis and presents a general half-quadratic framework for robust subspace clustering. Section 3 elaborates the proposed CIL2 graph for robust subspace clustering. Section 4 provides experimental results on face clustering under different settings. We conclude this paper in Section 5.


2. A General Half-Quadratic Framework for 
Robust Subspace Clustering 

For a given data matrix $X \in \mathbb{R}^{d \times n}$, consider the following general problem:

$$\min_{Z} \ J(Z) = \mathcal{L}(E) + \lambda\mathcal{R}(Z) \quad \text{s.t.} \quad E = X - XZ, \qquad (2)$$


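Table 1. The popular previous subspace clustering models can be solved by half-quadratic minimization.

| Methods | Objective function $\min_Z \mathcal{L}(X - XZ) + \lambda\mathcal{R}(Z)$ | $\mathcal{L}(\cdot)$ | $\mathcal{R}(\cdot)$ |
|---|---|---|---|
| SSC [4] | $\min_Z \Vert X - XZ\Vert_F^2 + \lambda\Vert Z\Vert_1$ | $\Vert\cdot\Vert_F^2$ | $\Vert\cdot\Vert_1$ |
| LRR [10] | $\min_Z \Vert X - XZ\Vert_{21} + \lambda\Vert Z\Vert_*$ | $\Vert\cdot\Vert_{21}$ | $\Vert\cdot\Vert_*$ |
| MSR [14] | $\min_Z \Vert X - XZ\Vert_{21} + \lambda\Vert Z\Vert_1 + \lambda\gamma\Vert Z\Vert_*$ | $\Vert\cdot\Vert_{21}$ | $\Vert\cdot\Vert_1 + \gamma\Vert\cdot\Vert_*$ |
| LSR [13] | $\min_Z \Vert X - XZ\Vert_F^2 + \lambda\Vert Z\Vert_F^2$ | $\Vert\cdot\Vert_F^2$ | $\Vert\cdot\Vert_F^2$ |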


where $\mathcal{L}(E)$ is the loss function chosen to be robust to outliers or gross errors, and $\mathcal{R}(Z)$ is the regularization term. The loss function $\mathcal{L}(E)$ and the regularizer $\mathcal{R}(Z)$ may be non-quadratic, which can make problem (2) difficult to solve directly. However, if $\mathcal{L}(E)$ and $\mathcal{R}(Z)$ satisfy certain conditions, we can minimize $J(Z)$ by half-quadratic analysis.

In this work, we consider a general function $\phi(x)$ that satisfies the following conditions [?]:

(a) $x \mapsto \phi(x)$ is convex on $\mathbb{R}$,
(b) $x \mapsto \phi(\sqrt{x})$ is concave on $\mathbb{R}_+$,
(c) $\phi(x) = \phi(-x)$, $x \in \mathbb{R}$,
(d) $\phi(x)$ is $C^1$ on $\mathbb{R}$,
(e) $\phi''(0^+) > 0$,
(f) $\lim_{x \to \infty} \phi(x)/x^2 = 0$. (3)

Or, in the matrix form $\phi(M)$:

(a) $M \mapsto \phi(M)$ is convex on $\mathbb{R}^{N \times N}$,
(b) $M \mapsto \phi(\sqrt{M})$ is concave on $\mathbb{S}_+^{N \times N}$,
(c) $\phi(M) = \phi(-M)$, $M \in \mathbb{R}^{N \times N}$,
(d) $\phi$ is $C^1$ on $\mathbb{R}^{N \times N}$,
(e) $\phi(M)$ is strictly convex at $M = 0$,
(f) $\lim_{M \to \infty} \phi(M)/\|M\|_F^2 = 0$. (4)

If $\phi(\cdot)$ satisfies all the conditions in (3), there exists a dual function $\psi$ [17] such that

$$\phi(x) = \inf_{s}\left\{\tfrac{1}{2}sx^2 + \psi(s)\right\}, \qquad (5)$$

where $s$ is determined by the minimizer function $\delta(\cdot)$ with respect to $\phi(\cdot)$. Under certain restrictive assumptions, $\delta(\cdot)$ admits the explicit form

$$s = \delta(t) = \begin{cases} \phi''(0^+), & \text{if } t = 0, \\ \phi'(t)/t, & \text{if } t \neq 0. \end{cases} \qquad (6)$$

If $\mathcal{L}(E) = \sum_{ij}\phi(E_{ij})$ (a similar analysis can be performed on $\mathcal{R}(Z)$), problem (2) reads:

$$\min_{Z} \ J(Z) = \sum_{ij}\phi(E_{ij}) + \lambda\mathcal{R}(Z) \quad \text{s.t.} \quad E = X - XZ. \qquad (7)$$

Applying (5) to each $E_{ij}$, the augmented function of $J$ in (7) is

$$J(Z, S) = \sum_{ij}\left(\tfrac{1}{2}S_{ij}E_{ij}^2 + \psi(S_{ij})\right) + \lambda\mathcal{R}(Z). \qquad (8)$$

Based on half-quadratic optimization, $J(Z, S)$ can be minimized by the following alternating procedure (with $E = X - XZ$):

$$S_{ij} = \delta(E_{ij}), \qquad (9)$$

$$Z = \arg\min_{Z} \sum_{ij}\tfrac{1}{2}S_{ij}E_{ij}^2 + \lambda\mathcal{R}(Z). \qquad (10)$$

The sequence generated by the above scheme converges, since the objective function in (8) is nonincreasing under the update rules in (9) and (10) [17].
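The alternation (9)-(10) can be illustrated on a toy vector problem. The NumPy sketch below is a minimal illustration under assumptions chosen for clarity, not the authors' implementation: it fits a robust linear regression $\min_z \sum_i \phi(y_i - a_i^T z) + \lambda\|z\|_2^2$ with the smoothed absolute loss $\phi(x) = \sqrt{x^2 + \epsilon^2}$ discussed next, for which (6) gives $\delta(t) = 1/\sqrt{t^2 + \epsilon^2}$. The data, the function names (`phi_l1`, `delta_l1`, `hq_robust_regression`) and the values of $\lambda$, $\epsilon$ and the iteration count are hypothetical.

```python
import numpy as np

def phi_l1(x, eps=1e-3):
    """Smoothed absolute value sqrt(x^2 + eps^2), a C^1 stand-in for |x|."""
    return np.sqrt(x ** 2 + eps ** 2)

def delta_l1(t, eps=1e-3):
    """Minimizer function (6): phi'(t)/t = 1/sqrt(t^2 + eps^2), which equals phi''(0+) = 1/eps at t = 0."""
    return 1.0 / np.sqrt(t ** 2 + eps ** 2)

def hq_robust_regression(A, y, lam=0.1, eps=1e-3, iters=50):
    """Alternate the S-step (9) and the z-step (10) for
    min_z sum_i phi(y_i - A_i z) + lam * ||z||^2 (a toy vector instance of (7))."""
    n_feat = A.shape[1]
    z = np.zeros(n_feat)
    history = []
    for _ in range(iters):
        e = y - A @ z
        history.append(phi_l1(e, eps).sum() + lam * z @ z)
        s = delta_l1(e, eps)                       # S-step (9)
        # z-step (10): minimize sum_i 0.5 * s_i * e_i^2 + lam * ||z||^2,
        # whose normal equations are (A^T S A + 2 lam I) z = A^T S y.
        AtS = A.T * s                              # A^T Diag(s)
        z = np.linalg.solve(AtS @ A + 2.0 * lam * np.eye(n_feat), AtS @ y)
    e = y - A @ z
    history.append(phi_l1(e, eps).sum() + lam * z @ z)
    return z, np.array(history)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
z_true = np.array([1.0, -2.0, 0.5])
y = A @ z_true + 0.01 * rng.normal(size=100)
y[:10] += 20.0                                     # gross outliers on 10% of the observations
z_hat, hist = hq_robust_regression(A, y)
print(np.round(z_hat, 2))
```

Because each step exactly minimizes the augmented function over one block of variables, the recorded objective should be nonincreasing, and the ten shifted observations receive small weights $s_i$, which is how the scheme limits their influence.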

For the L1 norm, $\phi_1(x) = |x| = \sqrt{x^2}$ does not satisfy condition (d) in (3). We use $\phi_1(x) = \sqrt{x^2 + \epsilon^2}$ as an approximation of $|x|$ with a small positive value $\epsilon$. It can be easily seen that $\sqrt{x^2 + \epsilon^2}$ satisfies all the conditions in (3); in this sense, we roughly say that the L1 norm satisfies all the conditions in (3). Previous work [?] solving L1-minimization by iteratively reweighted least squares can be interpreted as the half-quadratic optimization in (9) and (10). For the L21 norm, $\phi_{21}(X) = \|X\|_{21} = \sum_i \|X_i\|_2 \approx \sum_i (\|X_i\|_2^2 + \epsilon)^{1/2}$, where $\epsilon$ is a small positive value. It is easy to check that $\phi_{21}(x) = (x^2 + \epsilon)^{1/2}$ also satisfies all the conditions in (3). For the nuclear norm, $\phi_*(X) = \mathrm{Tr}(X^TX)^{1/2} \approx \mathrm{Tr}(X^TX + \epsilon I)^{1/2}$, where $\epsilon$ is a small positive value. It is easy to check that $\mathrm{Tr}(X^TX + \epsilon I)^{1/2}$ satisfies conditions (a)-(e) in (4). For condition (f), the Cauchy-Schwarz inequality gives $\|X\|_* = \sum_{i=1}^{N}\sigma_i \le \sqrt{N}\,(\sum_{i=1}^{N}\sigma_i^2)^{1/2} = \sqrt{N}\,\|X\|_F$ for $X \in \mathbb{R}^{N \times N}$, and thus $\lim_{X \to \infty}\|X\|_*/\|X\|_F^2 \le \lim_{X \to \infty}\sqrt{N}/\|X\|_F = 0$.

Therefore the nuclear norm also satisfies all the conditions in (4). The work [16] solving low-rank minimization by iteratively reweighted least squares can be interpreted as half-quadratic minimization.

If two functions both satisfy all the conditions in (3), their sum also satisfies these conditions. The optimization method in [14] for minimizing $\|X\|_1 + \gamma\|X\|_*$ can be regarded as the half-quadratic optimization in (9) and (10).

Based on the above analysis, previous subspace clustering methods using the L1 norm, L21 norm and nuclear norm can be optimized by the half-quadratic scheme in (9) and (10) after slightly relaxing the objective function.
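The nuclear-norm case is the least obvious of the three. The NumPy sketch below checks numerically, on a random low-rank matrix, that the smoothed trace $\mathrm{Tr}(X^TX + \epsilon I)^{1/2}$ stays close to $\|X\|_*$, and that the matrix surrogate $\tfrac{1}{2}\mathrm{Tr}(W(X^TX + \epsilon I)) + \tfrac{1}{2}\mathrm{Tr}(W^{-1})$, a matrix analogue of (5), is tight at $W = (X^TX + \epsilon I)^{-1/2}$ and an upper bound for any other positive definite $W$. This variational form is a standard way to derive reweighted least squares for the nuclear norm; it is offered as an illustration and is not claimed to be the exact formulation used in [16]. Helper names, sizes and $\epsilon$ are hypothetical.

```python
import numpy as np

def psd_power(M, p):
    """M^p for a symmetric positive definite M, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals ** p) @ vecs.T

def smoothed_nuclear(X, eps=1e-6):
    """Tr((X^T X + eps I)^{1/2}), the smoothed nuclear norm."""
    M = X.T @ X + eps * np.eye(X.shape[1])
    return np.trace(psd_power(M, 0.5))

def hq_surrogate(X, W, eps=1e-6):
    """Quadratic-in-X surrogate 0.5 * Tr(W (X^T X + eps I)) + 0.5 * Tr(W^{-1})."""
    M = X.T @ X + eps * np.eye(X.shape[1])
    return 0.5 * np.trace(W @ M) + 0.5 * np.trace(np.linalg.inv(W))

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4)) @ rng.normal(size=(4, 5))   # a 6 x 5 matrix of rank 4
nuc = np.linalg.svd(X, compute_uv=False).sum()          # exact nuclear norm
approx = smoothed_nuclear(X)

# The weight W* = (X^T X + eps I)^{-1/2} makes the surrogate tight ...
W_star = psd_power(X.T @ X + 1e-6 * np.eye(5), -0.5)
tight = hq_surrogate(X, W_star)
# ... while any other positive definite W only gives an upper bound.
B = rng.normal(size=(5, 5))
loose = hq_surrogate(X, B @ B.T + np.eye(5))
print(nuc, approx, tight, loose)
```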
Figure 2. Comparison of different loss functions.


As shown in Table 1, previous subspace clustering methods, including SSC, LRR, MSR and LSR, can be regarded as special cases of problem (2) from the view of half-quadratic analysis. Note that the squared Frobenius norm $\|\cdot\|_F^2$ does not need a half-quadratic representation because it is already quadratic; we also list it in Table 1 since it is widely used.


3. Correntropy Induced L2 Graph for Robust 
Subspace Clustering 

3.1. Correntropy Induced Metric 

The mean squared error (MSE) is probably the most widely used methodology for quantifying how similar two random variables are. Successful engineering solutions based on this methodology rely heavily on Gaussianity and linearity assumptions. The work in [?] extended the concept of mean squared error adaptation to information theoretic learning (ITL) to include information theoretic criteria. The same authors further proposed the concept of correntropy to process non-Gaussian and impulsive noises [12]. Correntropy is a generalized similarity measure between two arbitrary scalar random variables $u$ and $v$, defined by

$$V_\sigma(u, v) = \mathbb{E}[k_\sigma(e)], \qquad (11)$$

where $e = u - v$, $\mathbb{E}[\cdot]$ is the expectation operator, and $k_\sigma(\cdot)$ is the kernel function. In this work, we only consider the Gaussian kernel $k_\sigma(e) = \exp(-e^2/2\sigma^2)$. In practice, we usually have only a finite number of data $\{(u_i, v_i)\}_{i=1}^{n}$, which leads to the sample estimator of correntropy:

$$\hat{V}_\sigma(u, v) = \frac{1}{n}\sum_{i=1}^{n}k_\sigma(u_i - v_i). \qquad (12)$$


Based on (12), Liu et al. [?] extended the concept of correntropy to a general similarity measurement between any two vectors, which is called the Correntropy Induced Metric (CIM). It is formally defined as

$$\mathrm{CIM}(u, v) = \left(k_\sigma(0) - \frac{1}{n}\sum_{i=1}^{n}k_\sigma(e_i)\right)^{1/2}, \qquad (13)$$

where $e_i = u_i - v_i$ for each $i = 1, \ldots, n$.

Figure 2 shows a comparison of the absolute error, the mean squared error and CIM. The mean squared error is a global metric that increases quadratically for large errors. CIM is a local metric that behaves like the absolute error when the errors are relatively small, while for large errors its value approaches 1. Large errors are usually caused by outliers, but their effect on CIM is limited; therefore CIM is more robust to non-Gaussian noises. The effectiveness and robustness of correntropy have been verified in face recognition [?], feature selection [?] and signal processing [?]. This paper uses this concept for robust subspace clustering.
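The saturation of CIM described above can be checked with a few lines of NumPy. The sketch below implements the sample CIM (13) with the Gaussian kernel (so $k_\sigma(0) = 1$) and compares it with the mean squared error when a single entry of a 10-dimensional vector is corrupted by an error of growing magnitude. The value $\sigma = 1$ and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(e, sigma):
    """Gaussian kernel k_sigma(e) = exp(-e^2 / (2 sigma^2)), so k_sigma(0) = 1."""
    return np.exp(-e ** 2 / (2.0 * sigma ** 2))

def cim(u, v, sigma=1.0):
    """Sample Correntropy Induced Metric (13) between vectors u and v."""
    e = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return np.sqrt(1.0 - gaussian_kernel(e, sigma).mean())   # k_sigma(0) = 1

def mse(u, v):
    """Mean squared error between vectors u and v."""
    e = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return np.mean(e ** 2)

u = np.zeros(10)
for err in [0.1, 1.0, 10.0, 100.0]:
    v = u.copy()
    v[0] = err                          # a single corrupted entry of growing magnitude
    print(f"error={err:>6}: MSE={mse(u, v):10.3f}  CIM={cim(u, v):.4f}")
```

Once the error is a few kernel widths large, the corrupted entry contributes at most $\sqrt{1/n}$ to CIM (here $\sqrt{0.1} \approx 0.316$), whereas its contribution to the MSE keeps growing quadratically.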

3.2. Correntropy Induced L2 Graph 

For robust subspace clustering, we use correntropy in place of the Frobenius norm in the LSR model to measure the reconstruction error, leading to the Correntropy Induced L2 (CIL2) graph:

$$\min_{Z} \sum_{i,j}\left(1 - k_\sigma(E_{ij})\right) + \lambda\|Z\|_F^2 \quad \text{s.t.} \quad E = X - XZ. \qquad (14)$$


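It is easy to check that $\phi_\sigma(x) = 1 - k_\sigma(x) = 1 - \exp(-x^2/2\sigma^2)$ satisfies all the conditions in (3). Therefore the above problem can be solved by half-quadratic analysis. According to (8), problem (14) is equivalent to the following augmented objective function:

$$J(Z, S) = \sum_{ij}\left(\tfrac{1}{2}S_{ij}E_{ij}^2 + \psi(S_{ij})\right) + \lambda\|Z\|_F^2 \quad \text{s.t.} \quad E = X - XZ, \qquad (15)$$

where $\psi(\cdot)$ is the dual function corresponding to $\phi_\sigma(\cdot)$. We can minimize $J(Z, S)$ in (15) by the following alternating procedure:

$$S_{ij} = \frac{1}{\sigma^2}\exp\left(-E_{ij}^2/2\sigma^2\right), \qquad (16)$$

$$Z_i = \arg\min_{Z_i} \tfrac{1}{2}(X - XZ)_i^T\,\mathrm{Diag}(S_i)\,(X - XZ)_i + \lambda\|Z_i\|_2^2, \qquad (17)$$

where $S_i$ and $Z_i$ denote the $i$-th columns of $S$ and $Z$, respectively.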


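Let $\tilde{X} = \mathrm{Diag}(\sqrt{S_i})X$, with the square root taken elementwise; then problem (17) is also a least squares regression model:

$$\min_{Z_i} \tfrac{1}{2}\|\tilde{X}_i - \tilde{X}Z_i\|_2^2 + \lambda\|Z_i\|_2^2. \qquad (18)$$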
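Problem (18) is a ridge regression, so each column $Z_i$ has a closed form. Setting the gradient of the objective in (17) to zero gives $(X^TDX + 2\lambda I)Z_i = X^TDX_i$ with $D = \mathrm{Diag}(S_i)$; the factor 2 comes from the $\tfrac{1}{2}$ convention used in (15). The NumPy sketch below (function name and test data are hypothetical) solves this system and cross-checks it against the rescaled least squares form (18).

```python
import numpy as np

def weighted_lsr_column(X, i, s_i, lam):
    """Closed-form Z-step for column i:
    min_z 0.5 * (X_i - X z)^T Diag(s_i) (X_i - X z) + lam * ||z||^2,
    i.e. (X^T D X + 2 lam I) z = X^T D X_i with D = Diag(s_i)."""
    XtD = X.T * s_i                               # X^T Diag(s_i)
    n = X.shape[1]
    return np.linalg.solve(XtD @ X + 2.0 * lam * np.eye(n), XtD @ X[:, i])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
X /= np.linalg.norm(X, axis=0)                    # normalized data points (columns)
s = rng.uniform(0.1, 1.0, size=20)                # hypothetical weights S_i for column i = 3
z = weighted_lsr_column(X, 3, s, lam=0.1)

# Cross-check with the rescaled form (18), X_tilde = Diag(sqrt(s)) X, written as
# an ordinary least squares problem on the stacked system [X_tilde; sqrt(2 lam) I].
Xt = np.sqrt(s)[:, None] * X
A = np.vstack([Xt, np.sqrt(2 * 0.1) * np.eye(8)])
b = np.concatenate([Xt[:, 3], np.zeros(8)])
z_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(z - z_ref)))
```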


The kernel size $\sigma$ may affect the performance of the proposed model, and it is usually determined empirically.
In this study, the kernel size is computed from the average squared reconstruction error:

$$\sigma^2 = \frac{1}{dn}\|X - XZ\|_F^2. \qquad (19)$$

From (15) or problem (18), we can see that the correntropy based LSR model can be regarded as a weighted LSR, in which each weight $S_{ij}$ controls the effect of the corresponding error $E_{ij}$.
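Putting (16), (17) and (19) together gives the following minimal NumPy sketch of the CIL2 alternation. Two choices here are assumptions not fixed by the text: $Z$ is warm-started with plain LSR, and $\sigma^2$ is computed once by (19) from that warm start and then kept fixed, so that each pass is an exact half-quadratic step on (14) and the objective should not increase. Re-estimating $\sigma$ at every iteration is another reasonable reading of the text. Function names and the synthetic data (two random 3-dimensional subspaces of $\mathbb{R}^{30}$) are hypothetical.

```python
import numpy as np

def lsr(X, lam):
    """Plain LSR warm start: argmin ||X - XZ||_F^2 + lam ||Z||_F^2 = (X^T X + lam I)^{-1} X^T X."""
    G = X.T @ X
    return np.linalg.solve(G + lam * np.eye(X.shape[1]), G)

def cil2_objective(X, Z, sigma2, lam):
    """Objective (14): sum_ij (1 - k_sigma(E_ij)) + lam ||Z||_F^2 with E = X - XZ."""
    E = X - X @ Z
    return np.sum(1.0 - np.exp(-E ** 2 / (2.0 * sigma2))) + lam * np.sum(Z ** 2)

def cil2(X, lam=0.1, iters=20):
    """Alternate the S-step (16) and the column-wise Z-step (17) with a fixed kernel size."""
    n = X.shape[1]
    Z = lsr(X, lam)
    E = X - X @ Z
    sigma2 = np.mean(E ** 2)                       # (19), computed once (assumption)
    history = [cil2_objective(X, Z, sigma2, lam)]
    for _ in range(iters):
        S = np.exp(-E ** 2 / (2.0 * sigma2)) / sigma2            # S-step (16)
        for i in range(n):                                        # Z-step (17), column by column
            XtD = X.T * S[:, i]                                   # X^T Diag(S_i)
            Z[:, i] = np.linalg.solve(XtD @ X + 2.0 * lam * np.eye(n), XtD @ X[:, i])
        E = X - X @ Z
        history.append(cil2_objective(X, Z, sigma2, lam))
    return Z, sigma2, np.array(history)

rng = np.random.default_rng(0)
B1, B2 = rng.normal(size=(30, 3)), rng.normal(size=(30, 3))
X = np.hstack([B1 @ rng.normal(size=(3, 15)), B2 @ rng.normal(size=(3, 15))])
X += 0.01 * rng.normal(size=X.shape)               # small dense noise
X /= np.linalg.norm(X, axis=0)                     # normalized data points
Z, sigma2, hist = cil2(X, lam=0.1)
print(sigma2, hist[0], hist[-1])
```

The column loop mirrors (17) directly; since the columns share neither weights nor unknowns, it could be vectorized or parallelized without changing the result.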

3.3. Row Based Correntropy Induced L2 Graph 

In some real-world applications, the data may be occluded with outlier rows/features. For example, some rows of face images with sunglasses or a scarf are outliers, which are not discriminative for classification and clustering. In this case, we should measure the reconstruction error based on entire rows. The effect of each row can be controlled by assigning different weights, with all elements in the same row sharing the same weight. To this end, we obtain the row based Correntropy Induced L2 (rCIL2) graph by solving the following problem:

$$\min_{Z} \sum_{i}\left(1 - k_\sigma(\|E^i\|_2)\right) + \lambda\|Z\|_F^2 \quad \text{s.t.} \quad E = X - XZ. \qquad (20)$$



Figure 3. (a) Some corrupted face images from the Yale dataset, with 10%, 20%, 30%, 50%, 70% and 90% of pixels corrupted, respectively; (b) some face images with random block occlusion from the ORL dataset; (c) some face images with 20% occlusion by a monkey face from the AR dataset; (d) some face images with contiguous occlusion by sunglasses and scarf from the AR dataset.


According to the half-quadratic analysis, problem (20) is equivalent to the following problem:

$$J_r(Z, w) = \sum_{i}\left(\tfrac{1}{2}w_i\|X^i - X^iZ\|_2^2 + \psi(w_i)\right) + \lambda\|Z\|_F^2. \qquad (21)$$

Problem (21) can be solved by updating $Z$, $w$ and $\sigma$ alternately as follows:

$$w_i = \frac{1}{\sigma^2}\exp\left(-\|X^i - X^iZ\|_2^2/2\sigma^2\right), \qquad (22)$$

$$Z = \arg\min_{Z} \tfrac{1}{2}\mathrm{Tr}\left((X - XZ)^T\,\mathrm{Diag}(w)\,(X - XZ)\right) + \lambda\|Z\|_F^2, \qquad (23)$$

$$\sigma^2 = \frac{1}{d}\sum_{i=1}^{d}\|X^i - X^iZ\|_2^2. \qquad (24)$$

According to the half-quadratic analysis in Section 2, it is easy to prove that the sequences $\{J(Z^t, S^t), t = 1, 2, \ldots\}$ in (15) and $\{J_r(Z^t, w^t), t = 1, 2, \ldots\}$ in (21) converge.

3.4. The Grouping Effect

The CIL2 and rCIL2 graphs also use L2 regularization as in LSR [13], so they are expected to have the grouping effect as well, i.e. the coefficients of a group of correlated data points are approximately equal. The solutions obtained by CIL2 in (17) and by rCIL2 in (23) are solutions of weighted least squares regression models, which have the following grouping effect.

Proposition 1. Given a data vector $y \in \mathbb{R}^d$, data points $X \in \mathbb{R}^{d \times n}$, a weight vector $w \in \mathbb{R}^d$ whose entries correspond to the rows of $X$, and a parameter $\lambda > 0$, assume that each data point (column) of $X$ is normalized. Let $z^*$ be the optimal solution to the following weighted LSR problem (in vector form):

$$\min_{z} \|\mathrm{Diag}(w)(y - Xz)\|_2^2 + \lambda\|z\|_2^2. \qquad (25)$$

Then we have

$$\frac{|z_i^* - z_j^*|}{\|w\|_2\,\|\mathrm{Diag}(w)y\|_2} \le \frac{1}{\lambda}\sqrt{2(1 - r)}, \qquad (26)$$

where $r = X_i^TX_j$ is the sample correlation.

We omit the proof of Proposition 1 here; it can be proved in the same way as Theorem 7 in [13].

The correntropy mechanism and Proposition 1 ensure that both CIL2 and rCIL2 are not only robust to noise but also preserve the grouping effect.
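The bound (26) can be checked numerically. The NumPy sketch below (sizes, weights and names are hypothetical) solves the weighted LSR problem (25) in closed form, $z^* = (X^TW^2X + \lambda I)^{-1}X^TW^2y$ with $W = \mathrm{Diag}(w)$, for a dictionary in which two normalized columns are strongly correlated, and evaluates both sides of (26).

```python
import numpy as np

def weighted_lsr(X, y, w, lam):
    """Closed-form solution of (25): (X^T W^2 X + lam I) z = X^T W^2 y, W = Diag(w)."""
    XtW2 = X.T * w ** 2                             # X^T Diag(w)^2
    return np.linalg.solve(XtW2 @ X + lam * np.eye(X.shape[1]), XtW2 @ y)

rng = np.random.default_rng(0)
d, n, lam = 50, 6, 0.5
X = rng.normal(size=(d, n))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=d)       # columns 0 and 1 are highly correlated
X /= np.linalg.norm(X, axis=0)                      # normalized data points, as in Proposition 1
y = rng.normal(size=d)
w = rng.uniform(0.2, 1.0, size=d)                   # hypothetical row weights

z = weighted_lsr(X, y, w, lam)
r = X[:, 0] @ X[:, 1]                               # sample correlation
lhs = abs(z[0] - z[1]) / (np.linalg.norm(w) * np.linalg.norm(w * y))
rhs = np.sqrt(2.0 * (1.0 - r)) / lam
print(f"r = {r:.4f}, |z_0 - z_1| = {abs(z[0] - z[1]):.4f}, lhs = {lhs:.3g} <= rhs = {rhs:.3g}")
```

As $r \to 1$ the right-hand side vanishes, so the coefficients of the two correlated points are forced together, which is the grouping effect the proposition describes.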

3.5. Algorithm for Subspace Clustering 

Similar to the previous subspace clustering method LSR, which uses the representation coefficient matrix to construct the graph for clustering, we use the solution $Z^*$ learned by CIL2 or rCIL2 to construct a graph with weights $W = (|Z^*| + |Z^{*T}|)/2$, and then apply Normalized Cut [20] to cluster the data points into multiple clusters.
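A minimal NumPy sketch of this last step is given below. It builds $W = (|Z^*| + |Z^{*T}|)/2$ and then, as a stand-in for Normalized Cut [20], runs a normalized spectral clustering variant (k-means on the row-normalized top eigenvectors of $D^{-1/2}WD^{-1/2}$, with $D$ the degree matrix). This substitution, the simple k-means with farthest-point initialization, and all function names are assumptions for illustration. A synthetic, nearly block-diagonal $Z$ stands in for the learned $Z^*$.

```python
import numpy as np

def affinity_from_Z(Z):
    """Graph weights W = (|Z| + |Z^T|) / 2."""
    return (np.abs(Z) + np.abs(Z.T)) / 2.0

def kmeans(P, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [P[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((P - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((P[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([P[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(W, k):
    """Normalized spectral clustering on W (a stand-in for Normalized Cut)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L_sym = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(L_sym)                          # eigenvalues in ascending order
    U = vecs[:, -k:]                                         # top-k eigenvectors
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

rng = np.random.default_rng(0)
sizes = [5, 6, 4]
truth = np.repeat(np.arange(3), sizes)
Z = 0.02 * rng.normal(size=(15, 15))                          # small cross-block leakage
for b in range(3):
    idx = truth == b
    Z[np.ix_(idx, idx)] += rng.uniform(0.5, 1.0, size=(idx.sum(), idx.sum()))
labels = spectral_clustering(affinity_from_Z(Z), k=3)
print(labels)
```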


4. Experiments 

4.1. Datasets and Settings 

Our experiments are performed on three face datasets: 
Yale, ORL and AR. Descriptions of these data sets are given 
as follows. 









Figure 4. Clustering accuracy and NMI on the Yale dataset with 
pixel corruption for different algorithms. 



Figure 5. Clustering accuracy and NMI on the ORL dataset with 
pixel corruption for different algorithms. 


The Yale face dataset contains 165 grayscale images of 15 individuals. The images demonstrate variations in lighting condition and facial expression (normal, happy, sad, sleepy, surprised, and wink). The grayscale images are resized to a resolution of 32 × 32 pixels.

The ORL face dataset [19] contains 400 images of 40 individuals. Some images were captured at different times and have different variations, including expression (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses). The images were taken with a tolerance for some tilting and rotation of the face of up to 20 degrees. Each image is resized to 32 × 32 pixels.

The AR database [?] consists of over 4,000 facial images from 126 subjects. For each subject, 26 facial images are taken in two separate sessions. These images exhibit different facial variations, including various facial expressions (neutral, smile, anger, and scream), illumination variations (left light on, right light on, and all side lights on), and occlusion by sunglasses or scarf. We select a subset of the dataset consisting of 50 male subjects and 50 female subjects. The grayscale images are resized to a resolution of 32 × 32 pixels.

4.2. Evaluation Metrics 
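For concreteness, the two metrics defined in this subsection, the accuracy (27) and the NMI (29), can be computed as in the following minimal NumPy sketch. The function names are hypothetical, and the best label mapping $\mathrm{map}(\cdot)$ in (27) is found by brute force over permutations, which is only practical for a handful of clusters; with many clusters the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`) is the usual choice.

```python
import itertools
import numpy as np

def clustering_accuracy(y_true, y_pred):
    """Accuracy (27) under the best one-to-one mapping of predicted to true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    true_ids, pred_ids = np.unique(y_true), np.unique(y_pred)
    # count[a, b] = number of points with predicted label pred_ids[a] and true label true_ids[b]
    count = np.array([[np.sum((y_pred == p) & (y_true == t)) for t in true_ids]
                      for p in pred_ids])
    k = max(len(true_ids), len(pred_ids))
    padded = np.zeros((k, k), dtype=int)
    padded[:count.shape[0], :count.shape[1]] = count
    best = max(sum(padded[a, perm[a]] for a in range(k))
               for perm in itertools.permutations(range(k)))
    return best / len(y_true)

def nmi(y_true, y_pred):
    """NMI (29): mutual information (28) in bits divided by max(H(C), H(C'))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    mi = 0.0
    for c in np.unique(y_true):
        for c2 in np.unique(y_pred):
            p_joint = np.sum((y_true == c) & (y_pred == c2)) / n
            if p_joint > 0:
                p_c, p_c2 = np.mean(y_true == c), np.mean(y_pred == c2)
                mi += p_joint * np.log2(p_joint / (p_c * p_c2))

    def entropy(labels):
        p = np.unique(labels, return_counts=True)[1] / n
        return -np.sum(p * np.log2(p))

    denom = max(entropy(y_true), entropy(y_pred))
    return mi / denom if denom > 0 else 1.0

y = [0, 0, 0, 1, 1, 2, 2, 2]
p = [2, 2, 2, 0, 0, 1, 1, 0]          # relabeled clusters with one mistake
print(clustering_accuracy(y, p), nmi(y, p))
```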

The clustering result is evaluated by the accuracy and the normalized mutual information (NMI) metric as in [22]. For each data point $x_i$, let $p_i$ and $y_i$ be the obtained cluster label and the label provided by the ground truth, respectively. The accuracy is defined as follows:

$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{n}\delta(y_i, \mathrm{map}(p_i))}{n}, \qquad (27)$$

where $\delta(a, b)$ is the delta function that equals one if $a = b$ and zero otherwise, and $\mathrm{map}(p_i)$ is the permutation mapping function that maps each cluster label $p_i$ to the equivalent label in $y$.

Let $C$ denote the set of clusters obtained from the ground truth and $C'$ the set obtained by the segmentation method. Their mutual information metric $MI(C, C')$ is defined as follows:

$$MI(C, C') = \sum_{c_i \in C,\, c'_j \in C'} p(c_i, c'_j)\log_2\frac{p(c_i, c'_j)}{p(c_i)\,p(c'_j)}, \qquad (28)$$

where $p(c_i)$ and $p(c'_j)$ are the probabilities that a sample point arbitrarily selected from the data belongs to the clusters $c_i$ and $c'_j$, respectively, and $p(c_i, c'_j)$ is the joint probability that the arbitrarily selected data point belongs to both $c_i$ and $c'_j$. We use the normalized mutual information (NMI) as follows:

$$NMI(C, C') = \frac{MI(C, C')}{\max(H(C), H(C'))}, \qquad (29)$$

where $H(C)$ and $H(C')$ are the entropies of $C$ and $C'$, respectively. It is easy to see that $NMI(C, C')$ ranges from 0 to 1: NMI = 1 if the two sets of clusters are identical, and NMI = 0 if the two sets are independent.

4.3. Algorithm Settings

We compare our rCIL2 and CIL2 graphs with several graph construction methods for subspace clustering, including the L1-graph [3] (or SSC [4]), the L2-graph (LSR) [13], and the LRR-graph [10]. kNN and LLE [18] are also applied to construct graphs for subspace clustering. Kmeans is used as the baseline for comparison. The model parameters of these methods are searched from candidate value sets, and the best results are reported.

4.4. Results under Random Pixel Corruption

Figure 6. Clustering accuracy and NMI on the Yale dataset with block occlusion for different algorithms.

Figure 7. Clustering accuracy and NMI on the ORL dataset with block occlusion for different algorithms.

In some practical scenarios, face images may be partially corrupted. We evaluate the algorithmic robustness on the Yale and ORL face datasets. Each image is corrupted by replacing a percentage of randomly chosen pixels with i.i.d. samples from a uniform distribution on [0, 255]. The corrupted pixels are chosen randomly for each image, and their locations are unknown. We vary the percentage r of corrupted pixels from 10% to 100%. Figure 3(a) shows some examples of such corruptions. To human eyes, beyond 50% corruption, the corrupted images are barely recognizable as face images. Since the corruption is random, we repeat the experiments 20 times for each r and report the means of accuracy and NMI.
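The corruption protocol above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: images are float arrays, corrupted values are continuous draws from Uniform[0, 255] (the text does not say whether they are rounded), and the function name is hypothetical.

```python
import numpy as np

def corrupt_pixels(img, ratio, rng):
    """Replace a fraction `ratio` of randomly chosen pixels with i.i.d. Uniform[0, 255] values."""
    out = img.astype(float).copy()
    flat = out.reshape(-1)                                  # a view into `out`
    m = int(round(ratio * flat.size))
    idx = rng.choice(flat.size, size=m, replace=False)      # fresh locations for every call
    flat[idx] = rng.uniform(0.0, 255.0, size=m)
    return out, idx

rng = np.random.default_rng(0)
img = np.full((32, 32), 128.0)            # a flat stand-in for a 32 x 32 face image
corrupted, idx = corrupt_pixels(img, 0.3, rng)
print(len(idx), "of", img.size, "pixels replaced")
```

Calling the function once per image with a shared generator gives each image its own unknown corruption pattern, and repeating the whole procedure 20 times per r matches the averaging described above.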

Figures 4 and 5 show the means of clustering accuracy and NMI of different methods as functions of the corruption level. Both the accuracy and NMI decrease as more pixels of each image are randomly corrupted. Our proposed CIL2 and rCIL2 outperform the compared methods in most cases. In particular, CIL2 usually performs better than rCIL2 when the percentage of corrupted pixels is no more than 50% on the Yale dataset and 70% on the ORL dataset. This is because, when the level of random pixel corruption is low, whole rows of the images are unlikely to behave as outliers. LRR and the L2-graph perform competitively on both datasets, which also verifies the effectiveness of the grouping effect of these two methods for subspace clustering. When the images have a high percentage of corrupted pixels, none of the compared methods performs well due to insufficient discriminative information.

4.5. Results under Contiguous Occlusion 

Figure 8. Clustering accuracy and NMI on the AR dataset with an unrelated image occlusion for different algorithms.

In this subsection, we simulate various types of contiguous occlusion by replacing a randomly selected local region in some randomly selected images with a black-and-white square or an unrelated monkey image.

The first experiment is conducted on the Yale and ORL datasets with random block occlusion. Figure 3(b) shows some face images with such black-and-white occlusions of size 8 × 8 pixels. In each dataset, we select r percent of the images for occlusion, with r varying from 10% to 100%. The experiments are repeated 20 times for each r, and the means of accuracy and NMI are reported.
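The block occlusion protocol can be sketched in the same way. The text does not specify what the black-white square contains, so this NumPy sketch assumes a random binary (0 or 255) pattern placed at a uniformly random position; the function name and parameters are hypothetical.

```python
import numpy as np

def occlude_random_images(images, ratio, block=8, rng=None):
    """Occlude a fraction `ratio` of the images, each with one block x block patch at a
    random location; the patch is a random black-and-white (0/255) pattern (assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    out = images.astype(float).copy()
    n, h, w = out.shape
    chosen = rng.choice(n, size=int(round(ratio * n)), replace=False)
    for k in chosen:
        top = rng.integers(0, h - block + 1)
        left = rng.integers(0, w - block + 1)
        out[k, top:top + block, left:left + block] = 255.0 * rng.integers(0, 2, size=(block, block))
    return out, chosen

images = np.full((10, 32, 32), 128.0)     # ten flat stand-ins for 32 x 32 face images
occ, chosen = occlude_random_images(images, 0.3, rng=np.random.default_rng(0))
print(sorted(chosen.tolist()))
```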

Figures 6 and 7 show the means of clustering accuracy and NMI of each method for different percentages of occluded images. The CIL2-graph achieves the best accuracy and NMI on both the Yale and ORL datasets in all cases. Compared with previous subspace clustering methods, the improvement by rCIL2 is still limited. The phenomenon is similar to the random pixel corruption scenario: images with block occlusion do not produce entire outlier rows, so the rCIL2-graph is not very effective in this case. Notice that in this experiment, only r percent of the images in each dataset are occluded, each with an 8 × 8 block, and thus the decreasing curves of clustering accuracy and NMI are flatter than those in Figures 4 and 5.

The second experiment is conducted on a subset of the AR dataset. This subset consists of 1,400 images from 100 subjects, 50 males and 50 females. These images are non-occluded frontal views with various facial expressions in Sessions 1 and 2. For each image, we randomly select a local region to be replaced by an unrelated monkey image. The size of the monkey image is 14 × 14, i.e. about 20% of the pixels of each image are occluded. Figure 3(c) shows some face images with such unrelated image occlusions.

Figure 8 shows the clustering accuracy and NMI of each method on the AR dataset with unrelated monkey image occlusion. The results are similar to those of the above experiment: CIL2 still obtains the best results, and rCIL2, LRR and the L2-graph are competitive in this experiment.

4.6. Results on Real-World Malicious Occlusion 

In real-world face recognition systems, people may wear sunglasses or scarves, which makes classification or clustering more challenging. In this subsection, we evaluate the robustness of the proposed method on the AR dataset with sunglasses and scarf occlusions. The AR dataset contains two separate sessions. In each session, each subject has 7 face images with different facial variations, 3 face images with sunglasses occlusion and 3 face images with scarf occlusion. Figure 3(d) shows some face images with such occlusions. In each session, we conduct two experiments corresponding to the sunglasses and scarf occlusions. For sunglasses occlusion, we use the first 2 normal face images and the 3 face images with sunglasses of each subject. For scarf occlusion, we use the first 2 normal face images and the 3 face images with scarf of each subject.

Table 2. The clustering accuracy (%) and NMI (%) of different algorithms on the AR dataset.

| Methods | Accuracy, Session 1, Sunglasses | Accuracy, Session 1, Scarf | Accuracy, Session 2, Sunglasses | Accuracy, Session 2, Scarf | NMI, Session 1, Sunglasses | NMI, Session 1, Scarf | NMI, Session 2, Sunglasses | NMI, Session 2, Scarf |
|---|---|---|---|---|---|---|---|---|
| rCIL2 | 85.2 | 78.4 | 86.4 | 81.2 | 93.8 | 90.1 | 94.0 | 91.5 |
| CIL2 | 81.2 | 75.4 | 85.4 | 79.0 | 89.9 | 87.9 | 93.8 | 88.8 |
| L2 | 78.2 | 71.6 | 80.0 | 72.6 | 86.3 | 83.8 | 90.7 | 83.8 |
| LRR | 77.2 | 72.2 | 79.6 | 74.6 | 86.6 | 84.7 | 90.7 | 84.7 |
| L1 | 43.8 | 40.6 | 27.8 | 40.2 | 72.3 | 67.7 | 55.8 | 60.8 |
| kNN | 26.4 | 25.6 | 26.8 | 27.2 | 65.4 | 66.0 | 66.1 | 65.9 |
| LLE | 28.0 | 27.6 | 33.2 | 27.2 | 63.4 | 62.9 | 66.6 | 61.7 |
| kmeans | 30.0 | 29.4 | 30.8 | 29.8 | 65.4 | 63.8 | 65.3 | 65.5 |

Table 2 shows the clustering results on the AR dataset for the images with sunglasses and scarf occlusions. Different from the above experiments, rCIL2 achieves the best clustering accuracy and NMI in all cases. This is because face images with sunglasses and scarf occlusions contain many outlier rows/features, and rCIL2 is designed for such a task. Both the LRR and L2 graphs perform better than the L1 graph, which is consistent with the results in [10, 13].

5. Conclusions 

In this paper, we study the robust subspace clustering problem and present a general framework from the viewpoint of half-quadratic optimization that unifies the L1 norm, Frobenius norm, L21 norm and nuclear norm based subspace clustering methods. Previous iteratively reweighted least squares optimization methods for sparse and low-rank minimization can be regarded as half-quadratic optimization. As a new special case, we use correntropy as the loss function for robust subspace clustering to handle non-Gaussian and impulsive noises. An alternating minimization algorithm is used to optimize the non-convex correntropy objective. Extensive experiments on face clustering with various types of corruptions and occlusions demonstrate the effectiveness and robustness of the proposed methods in comparison with state-of-the-art subspace clustering methods.